Mining and Classification of Neologisms in Persian Blogs
نویسندگان
چکیده
The exponential growth of the Persian blogosphere and the increased number of neologisms create a major challenge in NLP applications of Persian blogs. This paper describes a method for extracting and classifying newly constructed words and borrowings from Persian blog posts. The analysis of the occurrence of neologisms across five distinct topic categories points to a correspondence between the topic domain and the type of neologism that is most commonly encountered. The results suggest that different approaches should be implemented for the automatic detection and processing of neologisms depending on the domain of application.
منابع مشابه
Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...
متن کاملUsing Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...
متن کاملیک سیستم نوین هوشمند تشخیص هویت نویسنده فارسی زبان بر اساس سبک نوشتاری - مقاله برگزیده هفدهمین کنفرانس ملی انجمن کامپیوتر ایران
The rapid development of communication by the Internet and the misuse of the anonymity embedded in the nature of online written documents have led to serious security issues. Anonymous identity of the Internet tools such as emails, blogs, and Web sites have made them target methods of interest for criminal activities. On the other hand, world social and political relations have made a great int...
متن کاملExploiting linguistic knowledge to infer properties of neologisms by C . Paul Cook A thesis submitted in conformity with the requirements
Exploiting linguistic knowledge to infer properties of neologisms C. Paul Cook Doctor of Philosophy Graduate Department of Computer Science University of Toronto 2010 Neologisms, or newly-coined words, pose problems for natural language processing (NLP) systems. Due to the recency of their coinage, neologisms are typically not listed in computational lexicons—dictionary-like resources that many...
متن کاملSecond language Writing Through Blogs: An Investigation of Learner Autonomy
Employing an explanatory sequential design, the present study investigated the effect of English as a Foreign Language (EFL) blog-mediated writing instruction on the students’ learner autonomy. A number of 46 learners who were the students of two intact classes were randomly assigned to control and experimental groups. Over a 16-week semester, the control group students (n = 21) were taught ba...
متن کامل